Overview

Dataset statistics

Number of variables: 27
Number of observations: 1852394
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 GiB
Average record size in memory: 958.0 B

Variable types

Categorical: 13
Numeric: 13
Boolean: 1

Alerts

trans_date_trans_time has a high cardinality: 1819551 distinct values [High cardinality]
merchant has a high cardinality: 693 distinct values [High cardinality]
first has a high cardinality: 355 distinct values [High cardinality]
last has a high cardinality: 486 distinct values [High cardinality]
street has a high cardinality: 999 distinct values [High cardinality]
city has a high cardinality: 906 distinct values [High cardinality]
state has a high cardinality: 51 distinct values [High cardinality]
job has a high cardinality: 497 distinct values [High cardinality]
dob has a high cardinality: 984 distinct values [High cardinality]
trans_num has a high cardinality: 1852394 distinct values [High cardinality]
zip is highly overall correlated with long and 2 other fields [High correlation]
lat is highly overall correlated with merch_lat and 1 other fields [High correlation]
long is highly overall correlated with zip and 2 other fields [High correlation]
unix_time is highly overall correlated with first_time_at_merchant [High correlation]
merch_lat is highly overall correlated with lat and 1 other fields [High correlation]
merch_long is highly overall correlated with zip and 2 other fields [High correlation]
amt_month is highly overall correlated with amt_month_shopping_net_spend and 1 other fields [High correlation]
amt_month_shopping_net_spend is highly overall correlated with amt_month and 1 other fields [High correlation]
count_month_shopping_net is highly overall correlated with amt_month and 1 other fields [High correlation]
state is highly overall correlated with zip and 4 other fields [High correlation]
first_time_at_merchant is highly overall correlated with unix_time [High correlation]
is_fraud is highly imbalanced (95.3%) [Imbalance]
amt is highly skewed (γ1 = 40.81280918) [Skewed]
trans_date_trans_time is uniformly distributed [Uniform]
trans_num is uniformly distributed [Uniform]
trans_num has unique values [Unique]
amt_month_shopping_net_spend has 276206 (14.9%) zeros [Zeros]
count_month_shopping_net has 276206 (14.9%) zeros [Zeros]
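
The imbalance percentage in the is_fraud alert can be read as one minus the normalized Shannon entropy of the class distribution. The sketch below assumes that definition, since the report does not state its formula. The function name and the class counts are hypothetical and chosen only for illustration.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts.

    0 means perfectly balanced; values near 1 mean almost all rows
    fall into a single class. Assumed definition, not taken from the report.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    # Shannon entropy in bits, skipping empty classes (0 * log 0 := 0).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Hypothetical counts, not taken from the dataset.
balanced = imbalance_score([500, 500])
skewed = imbalance_score([995, 5])
```

Under this definition a binary column needs a very small minority class to score above 0.95, which is consistent with a fraud label.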

Reproduction

Analysis started: 2024-04-09 19:01:33.958561
Analysis finished: 2024-04-09 19:05:06.989165
Duration: 3 minutes and 33.03 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json
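
The reported duration follows directly from the two timestamps. A minimal check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-04-09 19:01:33.958561", FMT)
finished = datetime.strptime("2024-04-09 19:05:06.989165", FMT)

# Subtracting two datetimes yields a timedelta.
elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
summary = f"{int(minutes)} minutes and {seconds:.2f} seconds"
```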

Variables

trans_date_trans_time
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 1819551
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 134.3 MiB

Value                    Count
2020-06-02 12:47:07      4
2019-04-22 16:02:01      4
2020-10-05 19:37:49      4
2020-06-01 01:37:47      4
2020-12-17 20:36:39      4
Other values (1819546)   1852374

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 35195486
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1787218
Unique (%): 96.5%
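
The report distinguishes "distinct" (how many different values occur) from "unique" (how many values occur exactly once). A small sketch of that distinction on a toy list, followed by a check of the percentages above; the helper name is hypothetical:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


toy = ["a", "b", "b", "c", "c", "c", "d"]
distinct, unique = distinct_and_unique(toy)  # "a" and "d" occur once

# Percentages reported above follow from the counts and the row count.
n_rows = 1852394
distinct_pct = round(100 * 1819551 / n_rows, 1)
unique_pct = round(100 * 1787218 / n_rows, 1)
```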

Sample

1st row: 2019-01-01 00:00:18
2nd row: 2019-01-01 00:00:44
3rd row: 2019-01-01 00:00:51
4th row: 2019-01-01 00:01:16
5th row: 2019-01-01 00:03:06

Common Values

Value                    Count     Frequency (%)
2020-06-02 12:47:07      4         < 0.1%
2019-04-22 16:02:01      4         < 0.1%
2020-10-05 19:37:49      4         < 0.1%
2020-06-01 01:37:47      4         < 0.1%
2020-12-17 20:36:39      4         < 0.1%
2020-12-13 17:53:47      4         < 0.1%
2020-12-19 16:02:22      4         < 0.1%
2020-06-16 21:07:32      3         < 0.1%
2020-03-29 15:19:49      3         < 0.1%
2020-12-06 13:06:24      3         < 0.1%
Other values (1819541)   1852357   > 99.9%

Length

Histogram of lengths of the category

Words

Value                  Count     Frequency (%)
2020-11-30             6530      0.2%
2020-12-07             6506      0.2%
2019-12-08             6428      0.2%
2019-12-15             6425      0.2%
2020-12-14             6400      0.2%
2020-12-21             6390      0.2%
2019-12-22             6325      0.2%
2020-12-28             6321      0.2%
2019-12-29             6320      0.2%
2019-12-01             6283      0.2%
Other values (87120)   3640860   98.3%

Most occurring characters

Value              Count     Frequency (%)
0                  6745202   19.2%
2                  5566647   15.8%
1                  4622458   13.1%
-                  3704788   10.5%
:                  3704788   10.5%
(space)            1852394   5.3%
9                  1767840   5.0%
3                  1661287   4.7%
5                  1467022   4.2%
4                  1457257   4.1%
Other values (3)   2645803   7.5%

Most occurring categories

Value               Count      Frequency (%)
Decimal Number      25933516   73.7%
Dash Punctuation    3704788    10.5%
Other Punctuation   3704788    10.5%
Space Separator     1852394    5.3%
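
The category names in this table are Unicode general categories. As a sketch, Python's `unicodedata.category` returns the two-letter codes for them: `Nd` (Decimal Number), `Pd` (Dash Punctuation), `Po` (Other Punctuation) and `Zs` (Space Separator). The helper name below is illustrative, and the sample string is the first row of the Common Values table.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of text by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "2020-06-02 12:47:07"
counts = category_counts(sample)

# Every value has the same 19-character layout, so the column totals
# are the per-value counts multiplied by the number of rows.
n_rows = 1852394
decimal_total = counts["Nd"] * n_rows
```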

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 6745202
26.0%
2 5566647
21.5%
1 4622458
17.8%
9 1767840
 
6.8%
3 1661287
 
6.4%
5 1467022
 
5.7%
4 1457257
 
5.6%
8 887465
 
3.4%
7 880319
 
3.4%
6 878019
 
3.4%
Dash Punctuation
ValueCountFrequency (%)
- 3704788
100.0%
Other Punctuation
ValueCountFrequency (%)
: 3704788
100.0%
Space Separator
ValueCountFrequency (%)
1852394
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 35195486
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 6745202
19.2%
2 5566647
15.8%
1 4622458
13.1%
- 3704788
10.5%
: 3704788
10.5%
1852394
 
5.3%
9 1767840
 
5.0%
3 1661287
 
4.7%
5 1467022
 
4.2%
4 1457257
 
4.1%
Other values (3) 2645803
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 35195486
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 6745202
19.2%
2 5566647
15.8%
1 4622458
13.1%
- 3704788
10.5%
: 3704788
10.5%
1852394
 
5.3%
9 1767840
 
5.0%
3 1661287
 
4.7%
5 1467022
 
4.2%
4 1457257
 
4.1%
Other values (3) 2645803
 
7.5%

cc_num
Real number (ℝ)

Distinct: 999
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1738604 × 10^17
Minimum: 6.0416207 × 10^10
Maximum: 4.9923464 × 10^18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.1 MiB

Quantile statistics

Minimum: 6.0416207 × 10^10
5-th percentile: 6.3048488 × 10^11
Q1: 1.8004295 × 10^14
median: 3.5214173 × 10^15
Q3: 4.6422555 × 10^15
95-th percentile: 4.497914 × 10^18
Maximum: 4.9923464 × 10^18
Range: 4.9923463 × 10^18
Interquartile range (IQR): 4.4622125 × 10^15

Descriptive statistics

Standard deviation: 1.3091153 × 10^18
Coefficient of variation (CV): 3.1364616
Kurtosis: 6.1753558
Mean: 4.1738604 × 10^17
Median Absolute Deviation (MAD): 3.0764709 × 10^15
Skewness: 2.8510736
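Sum: 5.0088429 × 10^18 (as reported; the mean times the number of observations is about 7.73 × 10^23, which suggests an integer overflow)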
Variance: 1.7137828 × 10^36
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count     Frequency (%)
6.538441737 × 10^15   4392      0.2%
3.02704321 × 10^13    4392      0.2%
6.538891243 × 10^15   4386      0.2%
4.364010865 × 10^15   4386      0.2%
4.642255475 × 10^15   4386      0.2%
6.011438889 × 10^15   4385      0.2%
3.447098678 × 10^14   4385      0.2%
4.904681492 × 10^15   4384      0.2%
4.586810169 × 10^15   4384      0.2%
4.512828415 × 10^18   4384      0.2%
Other values (989)    1808530   97.6%

Minimum 10 values

Value                 Count   Frequency (%)
6.041620718 × 10^10   2196    0.1%
6.042292873 × 10^10   2200    0.1%
6.042309813 × 10^10   738     < 0.1%
6.042785159 × 10^10   743     < 0.1%
6.048700208 × 10^10   735     < 0.1%
6.04905963 × 10^10    1465    0.1%
6.049559311 × 10^10   742     < 0.1%
5.018029536 × 10^11   2194    0.1%
5.018181333 × 10^11   8       < 0.1%
5.018282048 × 10^11   733     < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
4.992346398 × 10^18   2922    0.2%
4.989847571 × 10^18   1471    0.1%
4.980323468 × 10^18   736     < 0.1%
4.973530368 × 10^18   1467    0.1%
4.958589672 × 10^18   2191    0.1%
4.95682899 × 10^18    3657    0.2%
4.911818931 × 10^18   9       < 0.1%
4.906628656 × 10^18   3655    0.2%
4.897067971 × 10^18   1471    0.1%
4.890424427 × 10^18   2189    0.1%

merchant
Categorical

Distinct: 693
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 141.6 MiB

Value                Count
fraud_Kilback LLC    6262
fraud_Cormier LLC    5246
fraud_Schumm PLC     5195
fraud_Kuhn LLC       5031
fraud_Boyer PLC      4999
Other values (688)   1825661

Length

Max length: 43
Median length: 36
Mean length: 23.130553
Min length: 13

Characters and Unicode

Total characters: 42846898
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: fraud_Rippin, Kub and Mann
2nd row: fraud_Heller, Gutmann and Zieme
3rd row: fraud_Lind-Buckridge
4th row: fraud_Kutch, Hermiston and Farrell
5th row: fraud_Keeling-Crist

Common Values

Value                   Count     Frequency (%)
fraud_Kilback LLC       6262      0.3%
fraud_Cormier LLC       5246      0.3%
fraud_Schumm PLC        5195      0.3%
fraud_Kuhn LLC          5031      0.3%
fraud_Boyer PLC         4999      0.3%
fraud_Dickinson Ltd     4953      0.3%
fraud_Emard Inc         3867      0.2%
fraud_Cummerata-Jones   3860      0.2%
fraud_Corwin-Collins    3853      0.2%
fraud_Rodriguez Group   3843      0.2%
Other values (683)      1805285   97.5%

Length

Histogram of lengths of the category

Words

Value                Count     Frequency (%)
and                  677362    15.7%
llc                  139662    3.2%
inc                  131148    3.0%
sons                 104651    2.4%
ltd                  100896    2.3%
plc                  94799     2.2%
group                72089     1.7%
fraud_kutch          15028     0.3%
fraud_schaefer       13367     0.3%
fraud_streich        13235     0.3%
Other values (804)   2956186   68.5%
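
Entries such as `fraud_kutch` (lowercase, without the trailing comma of "fraud_Kutch, Hermiston and Farrell") suggest that words are obtained by lowercasing each value, splitting on whitespace, and stripping surrounding punctuation. The sketch below assumes exactly that rule, which is inferred from the table rather than stated in the report. The function name is hypothetical, and the input is the five sample rows listed for this variable.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase whitespace-separated tokens, trimming edge punctuation."""
    counts = Counter()
    trim = string.punctuation + string.whitespace
    for value in values:
        for token in value.lower().split():
            token = token.strip(trim)  # only leading/trailing characters are removed
            if token:
                counts[token] += 1
    return counts


sample_rows = [
    "fraud_Rippin, Kub and Mann",
    "fraud_Heller, Gutmann and Zieme",
    "fraud_Lind-Buckridge",
    "fraud_Kutch, Hermiston and Farrell",
    "fraud_Keeling-Crist",
]
counts = word_counts(sample_rows)
```

Because only the ends of each token are trimmed, the underscore and hyphen inside names like `fraud_lind-buckridge` survive.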

Most occurring characters

ValueCountFrequency (%)
a 4158232
 
9.7%
r 3851348
 
9.0%
d 3055994
 
7.1%
e 2665745
 
6.2%
u 2654462
 
6.2%
n 2526397
 
5.9%
2466029
 
5.8%
f 1996096
 
4.7%
_ 1852394
 
4.3%
o 1614017
 
3.8%
Other values (45) 16006184
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 32423363
75.7%
Uppercase Letter 4854354
 
11.3%
Space Separator 2466029
 
5.8%
Connector Punctuation 1852394
 
4.3%
Dash Punctuation 636438
 
1.5%
Other Punctuation 614320
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 4158232
12.8%
r 3851348
11.9%
d 3055994
9.4%
e 2665745
 
8.2%
u 2654462
 
8.2%
n 2526397
 
7.8%
f 1996096
 
6.2%
o 1614017
 
5.0%
i 1542189
 
4.8%
t 1247340
 
3.8%
Other values (15) 7111543
21.9%
Uppercase Letter
ValueCountFrequency (%)
L 681216
14.0%
C 445751
 
9.2%
S 430704
 
8.9%
B 398258
 
8.2%
H 372874
 
7.7%
K 309458
 
6.4%
G 274723
 
5.7%
R 259594
 
5.3%
M 255426
 
5.3%
P 227921
 
4.7%
Other values (15) 1198429
24.7%
Other Punctuation
ValueCountFrequency (%)
, 572711
93.2%
' 41609
 
6.8%
Space Separator
ValueCountFrequency (%)
2466029
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1852394
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 636438
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 37277717
87.0%
Common 5569181
 
13.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 4158232
 
11.2%
r 3851348
 
10.3%
d 3055994
 
8.2%
e 2665745
 
7.2%
u 2654462
 
7.1%
n 2526397
 
6.8%
f 1996096
 
5.4%
o 1614017
 
4.3%
i 1542189
 
4.1%
t 1247340
 
3.3%
Other values (40) 11965897
32.1%
Common
ValueCountFrequency (%)
2466029
44.3%
_ 1852394
33.3%
- 636438
 
11.4%
, 572711
 
10.3%
' 41609
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42846898
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 4158232
 
9.7%
r 3851348
 
9.0%
d 3055994
 
7.1%
e 2665745
 
6.2%
u 2654462
 
6.2%
n 2526397
 
5.9%
2466029
 
5.8%
f 1996096
 
4.7%
_ 1852394
 
4.3%
o 1614017
 
3.8%
Other values (45) 16006184
37.4%

category
Categorical

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 119.3 MiB

Value              Count
gas_transport      188029
grocery_pos        176191
home               175460
shopping_pos       166463
kids_pets          161727
Other values (9)   984524

Length

Max length: 14
Median length: 12
Mean length: 10.525913
Min length: 4

Characters and Unicode

Total characters: 19498139
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
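
The mean length and the total character count for this variable are tied together by the row count: mean length = total characters / number of observations. A short check, with an illustrative helper applied to the five sample rows of this variable:

```python
def mean_length(values):
    """Average number of characters per value."""
    return sum(len(v) for v in values) / len(values)


# Figures reported for the category variable.
n_rows = 1852394
total_characters = 19498139
reported_mean = total_characters / n_rows

# The same computation on the five sample rows.
sample_mean = mean_length(
    ["misc_net", "grocery_pos", "entertainment", "gas_transport", "misc_pos"]
)
```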

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: misc_net
2nd row: grocery_pos
3rd row: entertainment
4th row: gas_transport
5th row: misc_pos

Common Values

Value              Count    Frequency (%)
gas_transport      188029   10.2%
grocery_pos        176191   9.5%
home               175460   9.5%
shopping_pos       166463   9.0%
kids_pets          161727   8.7%
shopping_net       139322   7.5%
entertainment      134118   7.2%
food_dining        130729   7.1%
personal_care      130085   7.0%
health_fitness     122553   6.6%
Other values (4)   327717   17.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
gas_transport 188029
10.2%
grocery_pos 176191
9.5%
home 175460
9.5%
shopping_pos 166463
9.0%
kids_pets 161727
8.7%
shopping_net 139322
7.5%
entertainment 134118
7.2%
food_dining 130729
 
7.1%
personal_care 130085
 
7.0%
health_fitness 122553
 
6.6%
Other values (4) 327717
17.7%

Most occurring characters

ValueCountFrequency (%)
s 2042254
10.5%
e 1838696
9.4%
o 1758769
9.0%
n 1705118
8.7%
p 1548294
 
7.9%
t 1538055
 
7.9%
_ 1484860
 
7.6%
r 1310440
 
6.7%
i 1190524
 
6.1%
a 950855
 
4.9%
Other values (10) 4130274
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18013279
92.4%
Connector Punctuation 1484860
 
7.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 2042254
11.3%
e 1838696
10.2%
o 1758769
9.8%
n 1705118
9.5%
p 1548294
8.6%
t 1538055
8.5%
r 1310440
7.3%
i 1190524
 
6.6%
a 950855
 
5.3%
g 865612
 
4.8%
Other values (9) 3264662
18.1%
Connector Punctuation
ValueCountFrequency (%)
_ 1484860
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 18013279
92.4%
Common 1484860
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 2042254
11.3%
e 1838696
10.2%
o 1758769
9.8%
n 1705118
9.5%
p 1548294
8.6%
t 1538055
8.5%
r 1310440
7.3%
i 1190524
 
6.6%
a 950855
 
5.3%
g 865612
 
4.8%
Other values (9) 3264662
18.1%
Common
ValueCountFrequency (%)
_ 1484860
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19498139
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 2042254
10.5%
e 1838696
9.4%
o 1758769
9.0%
n 1705118
8.7%
p 1548294
 
7.9%
t 1538055
 
7.9%
_ 1484860
 
7.6%
r 1310440
 
6.7%
i 1190524
 
6.1%
a 950855
 
4.9%
Other values (10) 4130274
21.2%

amt
Real number (ℝ)

Distinct: 60616
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.063567
Minimum: 1
Maximum: 28948.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.44
Q1: 9.64
median: 47.45
Q3: 83.1
95-th percentile: 195.34
Maximum: 28948.9
Range: 28947.9
Interquartile range (IQR): 73.46

Descriptive statistics

Standard deviation: 159.25397
Coefficient of variation (CV): 2.2729927
Kurtosis: 4181.9073
Mean: 70.063567
Median Absolute Deviation (MAD): 37.46
Skewness: 40.812809
Sum: 1.2978533 × 10^8
Variance: 25361.828
Monotonicity: Not monotonic
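
Several of these statistics are simple functions of the others: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The snippet below re-derives them from the rounded figures above, so agreement is only expected to within rounding.

```python
n_rows = 1852394
mean = 70.063567
std = 159.25397

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n_rows    # sum of all amounts
```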
Histogram with fixed size bins (bins=50)

Common Values

Value                  Count     Frequency (%)
1.14                   779       < 0.1%
1.1                    745       < 0.1%
1.04                   744       < 0.1%
1.08                   741       < 0.1%
1.2                    737       < 0.1%
1.25                   737       < 0.1%
1.02                   736       < 0.1%
1.01                   735       < 0.1%
1.22                   727       < 0.1%
1.03                   726       < 0.1%
Other values (60606)   1844987   99.6%

Minimum 10 values

Value   Count   Frequency (%)
1       332     < 0.1%
1.01    735     < 0.1%
1.02    736     < 0.1%
1.03    726     < 0.1%
1.04    744     < 0.1%
1.05    721     < 0.1%
1.06    671     < 0.1%
1.07    723     < 0.1%
1.08    741     < 0.1%
1.09    720     < 0.1%

Maximum 10 values

Value      Count   Frequency (%)
28948.9    1       < 0.1%
27390.12   1       < 0.1%
27119.77   1       < 0.1%
26544.12   1       < 0.1%
25086.94   1       < 0.1%
22768.11   1       < 0.1%
21437.71   1       < 0.1%
19364.91   1       < 0.1%
17897.24   1       < 0.1%
16837.08   1       < 0.1%

first
Categorical

Distinct: 355
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 111.4 MiB

Value                Count
Christopher          38112
Robert               30743
Jessica              29236
David                28564
Michael              28539
Other values (350)   1697200

Length

Max length: 11
Median length: 9
Mean length: 6.0802977
Min length: 3

Characters and Unicode

Total characters: 11263107
Distinct characters: 49
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jennifer
2nd row: Stephanie
3rd row: Edward
4th row: Jeremy
5th row: Tyler

Common Values

Value                Count     Frequency (%)
Christopher          38112     2.1%
Robert               30743     1.7%
Jessica              29236     1.6%
David                28564     1.5%
Michael              28539     1.5%
James                28496     1.5%
Jennifer             24181     1.3%
John                 23445     1.3%
Mary                 23424     1.3%
William              23396     1.3%
Other values (345)   1574258   85.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
christopher 38112
 
2.1%
robert 30743
 
1.7%
jessica 29236
 
1.6%
david 28564
 
1.5%
michael 28539
 
1.5%
james 28496
 
1.5%
jennifer 24181
 
1.3%
john 23445
 
1.3%
mary 23424
 
1.3%
william 23396
 
1.3%
Other values (345) 1574258
85.0%

Most occurring characters

ValueCountFrequency (%)
a 1438618
 
12.8%
e 1230164
 
10.9%
i 883628
 
7.8%
n 877668
 
7.8%
r 867952
 
7.7%
l 554750
 
4.9%
h 493347
 
4.4%
s 463151
 
4.1%
t 444904
 
4.0%
o 384330
 
3.4%
Other values (39) 3624595
32.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9410713
83.6%
Uppercase Letter 1852394
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1438618
15.3%
e 1230164
13.1%
i 883628
9.4%
n 877668
9.3%
r 867952
9.2%
l 554750
 
5.9%
h 493347
 
5.2%
s 463151
 
4.9%
t 444904
 
4.7%
o 384330
 
4.1%
Other values (16) 1772201
18.8%
Uppercase Letter
ValueCountFrequency (%)
J 312497
16.9%
M 207053
11.2%
S 163822
8.8%
A 161026
8.7%
C 151594
8.2%
D 123025
 
6.6%
K 122133
 
6.6%
R 100327
 
5.4%
T 95095
 
5.1%
L 89957
 
4.9%
Other values (13) 325865
17.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 11263107
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1438618
 
12.8%
e 1230164
 
10.9%
i 883628
 
7.8%
n 877668
 
7.8%
r 867952
 
7.7%
l 554750
 
4.9%
h 493347
 
4.4%
s 463151
 
4.1%
t 444904
 
4.0%
o 384330
 
3.4%
Other values (39) 3624595
32.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11263107
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1438618
 
12.8%
e 1230164
 
10.9%
i 883628
 
7.8%
n 877668
 
7.8%
r 867952
 
7.7%
l 554750
 
4.9%
h 493347
 
4.4%
s 463151
 
4.1%
t 444904
 
4.0%
o 384330
 
3.4%
Other values (39) 3624595
32.2%

last
Categorical

Distinct: 486
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 111.5 MiB

Value                Count
Smith                40940
Williams             33661
Davis                31434
Johnson              28590
Rodriguez            24879
Other values (481)   1692890

Length

Max length: 11
Median length: 10
Mean length: 6.1123751
Min length: 2

Characters and Unicode

Total characters: 11322527
Distinct characters: 48
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Banks
2nd row: Gill
3rd row: Sanchez
4th row: White
5th row: Garcia

Common Values

Value                Count     Frequency (%)
Smith                40940     2.2%
Williams             33661     1.8%
Davis                31434     1.7%
Johnson              28590     1.5%
Rodriguez            24879     1.3%
Martinez             21246     1.1%
Jones                19825     1.1%
Lewis                18293     1.0%
Miller               16821     0.9%
Gonzalez             16809     0.9%
Other values (476)   1599896   86.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
smith 40940
 
2.2%
williams 33661
 
1.8%
davis 31434
 
1.7%
johnson 28590
 
1.5%
rodriguez 24879
 
1.3%
martinez 21246
 
1.1%
jones 19825
 
1.1%
lewis 18293
 
1.0%
miller 16821
 
0.9%
gonzalez 16809
 
0.9%
Other values (476) 1599896
86.4%

Most occurring characters

ValueCountFrequency (%)
e 1122673
 
9.9%
r 941641
 
8.3%
a 926704
 
8.2%
n 869662
 
7.7%
o 832319
 
7.4%
l 698286
 
6.2%
s 696904
 
6.2%
i 622878
 
5.5%
t 412730
 
3.6%
h 327959
 
2.9%
Other values (38) 3870771
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9470133
83.6%
Uppercase Letter 1852394
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1122673
11.9%
r 941641
9.9%
a 926704
9.8%
n 869662
9.2%
o 832319
8.8%
l 698286
 
7.4%
s 696904
 
7.4%
i 622878
 
6.6%
t 412730
 
4.4%
h 327959
 
3.5%
Other values (15) 2018377
21.3%
Uppercase Letter
ValueCountFrequency (%)
M 226754
12.2%
W 152268
 
8.2%
S 150041
 
8.1%
C 133108
 
7.2%
B 120068
 
6.5%
R 118650
 
6.4%
H 116343
 
6.3%
G 108329
 
5.8%
J 102555
 
5.5%
P 94463
 
5.1%
Other values (13) 529815
28.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 11322527
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1122673
 
9.9%
r 941641
 
8.3%
a 926704
 
8.2%
n 869662
 
7.7%
o 832319
 
7.4%
l 698286
 
6.2%
s 696904
 
6.2%
i 622878
 
5.5%
t 412730
 
3.6%
h 327959
 
2.9%
Other values (38) 3870771
34.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11322527
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1122673
 
9.9%
r 941641
 
8.3%
a 926704
 
8.2%
n 869662
 
7.7%
o 832319
 
7.4%
l 698286
 
6.2%
s 696904
 
6.2%
i 622878
 
5.5%
t 412730
 
3.6%
h 327959
 
2.9%
Other values (38) 3870771
34.2%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 102.5 MiB

Value   Count
F       1014749
M       837645

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1852394
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: M
4th row: M
5th row: M

Common Values

Value   Count     Frequency (%)
F       1014749   54.8%
M       837645    45.2%
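
The Frequency (%) column is each count divided by the number of observations. A minimal check on the two gender counts (variable names are illustrative):

```python
counts = {"F": 1014749, "M": 837645}

# With no missing values, the counts add up to the number of observations.
n_rows = sum(counts.values())
frequency_pct = {value: round(100 * count / n_rows, 1) for value, count in counts.items()}
```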

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
f 1014749
54.8%
m 837645
45.2%

Most occurring characters

ValueCountFrequency (%)
F 1014749
54.8%
M 837645
45.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1852394
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 1014749
54.8%
M 837645
45.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 1852394
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 1014749
54.8%
M 837645
45.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1852394
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 1014749
54.8%
M 837645
45.2%

street
Categorical

Distinct: 999
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 140.0 MiB

Value                     Count
444 Robert Mews           4392
908 Brooks Brook          4392
5796 Lee Coves Apt. 286   4386
03512 Jackson Ports       4386
320 Nicholson Orchard     4386
Other values (994)        1830452

Length

Max length: 35
Median length: 29
Mean length: 22.231289
Min length: 12

Characters and Unicode

Total characters: 41181107
Distinct characters: 62
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 561 Perry Cove
2nd row: 43039 Riley Greens Suite 393
3rd row: 594 White Dale Suite 530
4th row: 9443 Cynthia Court Apt. 038
5th row: 408 Bradley Rest

Common Values

Value                       Count     Frequency (%)
444 Robert Mews             4392      0.2%
908 Brooks Brook            4392      0.2%
5796 Lee Coves Apt. 286     4386      0.2%
03512 Jackson Ports         4386      0.2%
320 Nicholson Orchard       4386      0.2%
40624 Rebecca Spurs         4385      0.2%
2924 Bobby Trafficway       4385      0.2%
574 David Locks Suite 207   4384      0.2%
6983 Carrillo Isle          4384      0.2%
864 Reynolds Plains         4384      0.2%
Other values (989)          1808530   97.6%

Length

Histogram of lengths of the category

Words

Value                 Count     Frequency (%)
apt                   468297    6.4%
suite                 437016    5.9%
island                32903     0.4%
michael               27058     0.4%
islands               25611     0.3%
station               25602     0.3%
common                25585     0.3%
david                 24853     0.3%
brooks                24143     0.3%
fields                23400     0.3%
Other values (1959)   6253340   84.9%

Most occurring characters

ValueCountFrequency (%)
5515414
 
13.4%
e 2561201
 
6.2%
a 2077034
 
5.0%
i 1851621
 
4.5%
t 1782137
 
4.3%
r 1576757
 
3.8%
n 1523518
 
3.7%
s 1476954
 
3.6%
l 1270600
 
3.1%
o 1251043
 
3.0%
Other values (52) 20294828
49.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20590784
50.0%
Decimal Number 9996511
24.3%
Space Separator 5515414
 
13.4%
Uppercase Letter 4610101
 
11.2%
Other Punctuation 468297
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2561201
12.4%
a 2077034
10.1%
i 1851621
9.0%
t 1782137
8.7%
r 1576757
 
7.7%
n 1523518
 
7.4%
s 1476954
 
7.2%
l 1270600
 
6.2%
o 1251043
 
6.1%
u 877146
 
4.3%
Other values (16) 4342773
21.1%
Uppercase Letter
ValueCountFrequency (%)
S 802997
17.4%
A 602938
13.1%
M 368232
 
8.0%
C 319120
 
6.9%
P 279598
 
6.1%
R 266324
 
5.8%
B 212130
 
4.6%
F 204980
 
4.4%
L 188044
 
4.1%
J 173487
 
3.8%
Other values (14) 1192251
25.9%
Decimal Number
ValueCountFrequency (%)
5 1069201
10.7%
3 1058055
10.6%
2 1049445
10.5%
7 1003976
10.0%
1 990872
9.9%
8 988888
9.9%
6 968316
9.7%
0 967826
9.7%
4 958817
9.6%
9 941115
9.4%
Space Separator
ValueCountFrequency (%)
5515414
100.0%
Other Punctuation
ValueCountFrequency (%)
. 468297
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 25200885
61.2%
Common 15980222
38.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2561201
 
10.2%
a 2077034
 
8.2%
i 1851621
 
7.3%
t 1782137
 
7.1%
r 1576757
 
6.3%
n 1523518
 
6.0%
s 1476954
 
5.9%
l 1270600
 
5.0%
o 1251043
 
5.0%
u 877146
 
3.5%
Other values (40) 8952874
35.5%
Common
ValueCountFrequency (%)
5515414
34.5%
5 1069201
 
6.7%
3 1058055
 
6.6%
2 1049445
 
6.6%
7 1003976
 
6.3%
1 990872
 
6.2%
8 988888
 
6.2%
6 968316
 
6.1%
0 967826
 
6.1%
4 958817
 
6.0%
Other values (2) 1409412
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 41181107
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5515414
 
13.4%
e 2561201
 
6.2%
a 2077034
 
5.0%
i 1851621
 
4.5%
t 1782137
 
4.3%
r 1576757
 
3.8%
n 1523518
 
3.7%
s 1476954
 
3.6%
l 1270600
 
3.1%
o 1251043
 
3.0%
Other values (52) 20294828
49.3%

city
Categorical

Distinct: 906
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 116.0 MiB

Value                Count
Birmingham           8040
San Antonio          7312
Utica                7309
Phoenix              7297
Meridian             7289
Other values (901)   1815147

Length

Max length: 25
Median length: 21
Mean length: 8.6526209
Min length: 3

Characters and Unicode

Total characters: 16028063
Distinct characters: 52
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Moravian Falls
2nd row: Orient
3rd row: Malad City
4th row: Boulder
5th row: Doe Hill

Common Values

Value                Count     Frequency (%)
Birmingham           8040      0.4%
San Antonio          7312      0.4%
Utica                7309      0.4%
Phoenix              7297      0.4%
Meridian             7289      0.4%
Warren               6584      0.4%
Conway               6574      0.4%
Cleveland            6572      0.4%
Thomas               6571      0.4%
Houston              5865      0.3%
Other values (896)   1782981   96.3%

Length

Histogram of lengths of the category

Words

Value                Count     Frequency (%)
city                 30780     1.3%
west                 27847     1.2%
saint                20483     0.9%
north                20472     0.9%
falls                18286     0.8%
new                  16857     0.7%
mount                16098     0.7%
lake                 16089     0.7%
san                  14638     0.6%
springs              12414     0.5%
Other values (929)   2118136   91.6%

Most occurring characters

ValueCountFrequency (%)
e 1555978
 
9.7%
a 1334959
 
8.3%
n 1173952
 
7.3%
o 1168590
 
7.3%
l 1115539
 
7.0%
r 1070587
 
6.7%
i 1007053
 
6.3%
t 855511
 
5.3%
s 637587
 
4.0%
459706
 
2.9%
Other values (42) 5648601
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 13253329
82.7%
Uppercase Letter 2313564
 
14.4%
Space Separator 459706
 
2.9%
Dash Punctuation 1464
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1555978
11.7%
a 1334959
10.1%
n 1173952
8.9%
o 1168590
8.8%
l 1115539
 
8.4%
r 1070587
 
8.1%
i 1007053
 
7.6%
t 855511
 
6.5%
s 637587
 
4.8%
d 441997
 
3.3%
Other values (15) 2891576
21.8%
Uppercase Letter
ValueCountFrequency (%)
C 224081
 
9.7%
M 211444
 
9.1%
S 193942
 
8.4%
B 190231
 
8.2%
H 165364
 
7.1%
W 136180
 
5.9%
P 131749
 
5.7%
L 123697
 
5.3%
R 113410
 
4.9%
A 106896
 
4.6%
Other values (15) 716570
31.0%
Space Separator
ValueCountFrequency (%)
459706
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1464
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 15566893
97.1%
Common 461170
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1555978
 
10.0%
a 1334959
 
8.6%
n 1173952
 
7.5%
o 1168590
 
7.5%
l 1115539
 
7.2%
r 1070587
 
6.9%
i 1007053
 
6.5%
t 855511
 
5.5%
s 637587
 
4.1%
d 441997
 
2.8%
Other values (40) 5205140
33.4%
Common
ValueCountFrequency (%)
459706
99.7%
- 1464
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16028063
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1555978
 
9.7%
a 1334959
 
8.3%
n 1173952
 
7.3%
o 1168590
 
7.3%
l 1115539
 
7.0%
r 1070587
 
6.7%
i 1007053
 
6.3%
t 855511
 
5.3%
s 637587
 
4.0%
459706
 
2.9%
Other values (42) 5648601
35.2%

state
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 51
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.2 MiB

Value               Count
TX                  135269
NY                  119419
PA                  114173
CA                  80495
OH                  66627
Other values (46)   1336411

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 3704788
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NC
2nd row: WA
3rd row: ID
4th row: MT
5th row: VA

Common Values

Value               Count     Frequency (%)
TX                  135269    7.3%
NY                  119419    6.4%
PA                  114173    6.2%
CA                  80495     4.3%
OH                  66627     3.6%
MI                  65825     3.6%
IL                  62212     3.4%
FL                  60775     3.3%
AL                  58521     3.2%
MO                  54904     3.0%
Other values (41)   1034174   55.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tx 135269
 
7.3%
ny 119419
 
6.4%
pa 114173
 
6.2%
ca 80495
 
4.3%
oh 66627
 
3.6%
mi 65825
 
3.6%
il 62212
 
3.4%
fl 60775
 
3.3%
al 58521
 
3.2%
mo 54904
 
3.0%
Other values (41) 1034174
55.8%

Most occurring characters

ValueCountFrequency (%)
A 508580
13.7%
N 406389
 
11.0%
M 314756
 
8.5%
I 260547
 
7.0%
T 220136
 
5.9%
L 211461
 
5.7%
O 205755
 
5.6%
C 201235
 
5.4%
Y 188176
 
5.1%
X 135269
 
3.7%
Other values (14) 1052484
28.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 3704788
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 508580
13.7%
N 406389
 
11.0%
M 314756
 
8.5%
I 260547
 
7.0%
T 220136
 
5.9%
L 211461
 
5.7%
O 205755
 
5.6%
C 201235
 
5.4%
Y 188176
 
5.1%
X 135269
 
3.7%
Other values (14) 1052484
28.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 3704788
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 508580
13.7%
N 406389
 
11.0%
M 314756
 
8.5%
I 260547
 
7.0%
T 220136
 
5.9%
L 211461
 
5.7%
O 205755
 
5.6%
C 201235
 
5.4%
Y 188176
 
5.1%
X 135269
 
3.7%
Other values (14) 1052484
28.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3704788
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 508580
13.7%
N 406389
 
11.0%
M 314756
 
8.5%
I 260547
 
7.0%
T 220136
 
5.9%
L 211461
 
5.7%
O 205755
 
5.6%
C 201235
 
5.4%
Y 188176
 
5.1%
X 135269
 
3.7%
Other values (14) 1052484
28.4%

zip
Real number (ℝ)

Distinct985
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean48813.258
Minimum1257
Maximum99921
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum1257
5-th percentile7208
Q126237
median48174
Q372042
95-th percentile94569
Maximum99921
Range98664
Interquartile range (IQR)45805

Descriptive statistics

Standard deviation26881.846
Coefficient of variation (CV)0.55070788
Kurtosis-1.0960542
Mean48813.258
Median Absolute Deviation (MAD)23068
Skewness0.078949647
Sum9.0421387 × 10¹⁰
Variance7.2263364 × 10⁸
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
82514               5116     0.3%
73754               5116     0.3%
48088               5115     0.3%
34112               5108     0.3%
61454               4392     0.2%
16114               4392     0.2%
89512               4386     0.2%
72476               4386     0.2%
84540               4386     0.2%
72042               4385     0.2%
Other values (975)  1805612  97.5%

Value  Count  Frequency (%)
1257   2923   0.2%
1330   1466   0.1%
1535   734    < 0.1%
1545   1468   0.1%
1612   738    < 0.1%
1843   3652   0.2%
1844   2919   0.2%
2180   738    < 0.1%
2630   2924   0.2%
2908   745    < 0.1%

Value  Count  Frequency (%)
99921  14     < 0.1%
99783  2203   0.1%
99747  12     < 0.1%
99746  734    < 0.1%
99323  3651   0.2%
99160  4362   0.2%
99116  15     < 0.1%
99113  1463   0.1%
99033  3646   0.2%
98836  740    < 0.1%

lat
Real number (ℝ)

Distinct983
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38.539311
Minimum20.0271
Maximum66.6933
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum20.0271
5-th percentile29.8826
Q134.6689
median39.3543
Q341.9404
95-th percentile45.8433
Maximum66.6933
Range46.6662
Interquartile range (IQR)7.2715

Descriptive statistics

Standard deviation5.0714704
Coefficient of variation (CV)0.13159214
Kurtosis0.79107707
Mean38.539311
Median Absolute Deviation (MAD)3.3597
Skewness-0.19199899
Sum71389988
Variance25.719812
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
36.385              5116     0.3%
43.0048             5116     0.3%
42.5164             5115     0.3%
26.1184             5108     0.3%
40.6761             4392     0.2%
41.3851             4392     0.2%
39.5483             4386     0.2%
38.9999             4386     0.2%
36.0244             4386     0.2%
34.2853             4385     0.2%
Other values (973)  1805612  97.5%

Value    Count  Frequency (%)
20.0271  2186   0.1%
20.0827  1463   0.1%
24.6557  3655   0.2%
26.1184  5108   0.3%
26.3304  741    < 0.1%
26.3771  732    < 0.1%
26.4215  4362   0.2%
26.4722  3650   0.2%
26.529   2202   0.1%
26.6939  1467   0.1%

Value    Count  Frequency (%)
66.6933  12     < 0.1%
65.6899  734    < 0.1%
64.7556  2203   0.1%
55.4732  14     < 0.1%
48.8878  4362   0.2%
48.8856  2909   0.2%
48.8328  2200   0.1%
48.6669  1469   0.1%
48.6031  4376   0.2%
48.4786  2916   0.2%

long
Real number (ℝ)

Distinct983
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-90.227832
Minimum-165.6723
Maximum-67.9503
Zeros0
Zeros (%)0.0%
Negative1852394
Negative (%)100.0%
Memory size14.1 MiB

Quantile statistics

Minimum-165.6723
5-th percentile-119.0825
Q1-96.798
median-87.4769
Q3-80.158
95-th percentile-73.5365
Maximum-67.9503
Range97.722
Interquartile range (IQR)16.64

Descriptive statistics

Standard deviation13.747895
Coefficient of variation (CV)-0.15236867
Kurtosis1.8375586
Mean-90.227832
Median Absolute Deviation (MAD)8.1527
Skewness-1.1469188
Sum-1.671375 × 10⁸
Variance189.00461
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
-98.0727            5116     0.3%
-108.8964           5116     0.3%
-82.9832            5115     0.3%
-81.7361            5108     0.3%
-91.0391            4392     0.2%
-80.1752            4392     0.2%
-82.7243            4391     0.2%
-109.615            4386     0.2%
-90.9288            4386     0.2%
-119.7957           4386     0.2%
Other values (973)  1805606  97.5%

Value      Count  Frequency (%)
-165.6723  2203   0.1%
-156.292   734    < 0.1%
-155.488   1463   0.1%
-155.3697  2186   0.1%
-153.994   12     < 0.1%
-133.1171  14     < 0.1%
-124.4409  1467   0.1%
-124.2174  2195   0.1%
-124.1587  1465   0.1%
-124.1437  2198   0.1%

Value     Count  Frequency (%)
-67.9503  2922   0.2%
-68.5565  1467   0.1%
-69.2675  743    < 0.1%
-69.4828  2931   0.2%
-69.9576  737    < 0.1%
-69.9656  4374   0.2%
-70.1031  9      < 0.1%
-70.239   1455   0.1%
-70.3001  2924   0.2%
-70.3457  2196   0.1%

city_pop
Real number (ℝ)

Distinct891
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean88643.675
Minimum23
Maximum2906700
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum23
5-th percentile139
Q1741
median2443
Q320328
95-th percentile525713
Maximum2906700
Range2906677
Interquartile range (IQR)19587

Descriptive statistics

Standard deviation301487.62
Coefficient of variation (CV)3.4011182
Kurtosis37.572846
Mean88643.675
Median Absolute Deviation (MAD)2188
Skewness5.5908046
Sum1.6420301 × 10¹¹
Variance9.0894784 × 10¹⁰
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
606                 8049     0.4%
1595797             7312     0.4%
1312922             7297     0.4%
241                 6578     0.4%
1766                6556     0.4%
2906700             5865     0.3%
302                 5853     0.3%
198                 5850     0.3%
276002              5849     0.3%
1126                5841     0.3%
Other values (881)  1787344  96.5%

Value  Count  Frequency (%)
23     2915   0.2%
37     1469   0.1%
43     2920   0.2%
46     4386   0.2%
47     734    < 0.1%
49     1472   0.1%
51     1470   0.1%
52     740    < 0.1%
53     3660   0.2%
60     1472   0.1%

Value    Count  Frequency (%)
2906700  5865   0.3%
2504700  2929   0.2%
2383912  737    < 0.1%
1595797  7312   0.4%
1577385  3680   0.2%
1526206  5113   0.3%
1417793  8      < 0.1%
1382480  2913   0.2%
1312922  7297   0.4%
1263321  5141   0.3%

job
Categorical

Distinct497
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size136.4 MiB
Film/video editor         13898
Exhibition designer       13167
Surveyor, land/geomatics  12436
Naval architect           12434
Materials engineer        11711
Other values (492)        1788748

Length

Max length59
Median length38
Mean length20.232398
Min length3

Characters and Unicode

Total characters37478372
Distinct characters53
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
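
As a rough illustration of how such character properties can be read off, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical and the two sample strings are simply job values listed in this report; this is an assumption-laden sketch, not necessarily how the profiling tool computes its own counts.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Tally the Unicode general category of every character in the given strings."""
    counts = Counter()
    for value in values:
        # unicodedata.category returns a two-letter code such as "Ll" or "Zs".
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two job titles taken from the values listed in this report.
sample = ["Film/video editor", "Surveyor, land/geomatics"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

The two-letter codes correspond to the category names used in the tables of this report: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Zs` is Space Separator, and `Po` is Other Punctuation.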

Unique

Unique0
Unique (%)0.0%

Sample

1st rowPsychologist, counselling
2nd rowSpecial educational needs teacher
3rd rowNature conservation officer
4th rowPatent attorney
5th rowDance movement psychotherapist

Common Values

Value                       Count    Frequency (%)
Film/video editor           13898    0.8%
Exhibition designer         13167    0.7%
Surveyor, land/geomatics    12436    0.7%
Naval architect             12434    0.7%
Materials engineer          11711    0.6%
Designer, ceramics/pottery  11688    0.6%
Environmental consultant    10974    0.6%
Financial adviser           10963    0.6%
Systems developer           10962    0.6%
IT trainer                  10943    0.6%
Other values (487)          1733218  93.6%

Length

Histogram of lengths of the category
Value               Count    Frequency (%)
engineer            188048   4.6%
officer             158202   3.8%
manager             87837    2.1%
scientist           79740    1.9%
designer            74639    1.8%
surveyor            70288    1.7%
teacher             54865    1.3%
psychologist        46856    1.1%
research            42426    1.0%
editor              40958    1.0%
Other values (457)  3270295  79.5%

Most occurring characters

Value              Count     Frequency (%)
e                  4003951   10.7%
i                  3407729   9.1%
r                  3140909   8.4%
a                  2593110   6.9%
t                  2547852   6.8%
n                  2521475   6.7%
(space)            2261760   6.0%
o                  2133314   5.7%
s                  2064644   5.5%
c                  1890653   5.0%
Other values (43)  10912975  29.1%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   32557679  86.9%
Space Separator    2261760   6.0%
Uppercase Letter   1956330   5.2%
Other Punctuation  633855    1.7%
Close Punctuation  34374     0.1%
Open Punctuation   34374     0.1%

Most frequent character per category

Lowercase Letter
Value              Count    Frequency (%)
e                  4003951  12.3%
i                  3407729  10.5%
r                  3140909  9.6%
a                  2593110  8.0%
t                  2547852  7.8%
n                  2521475  7.7%
o                  2133314  6.6%
s                  2064644  6.3%
c                  1890653  5.8%
l                  1428836  4.4%
Other values (16)  6825206  21.0%

Uppercase Letter
Value              Count   Frequency (%)
C                  224603  11.5%
E                  207826  10.6%
P                  204864  10.5%
S                  196162  10.0%
T                  161692  8.3%
M                  127340  6.5%
A                  126006  6.4%
F                  98100   5.0%
D                  82672   4.2%
R                  79744   4.1%
Other values (11)  447321  22.9%

Other Punctuation
Value  Count   Frequency (%)
,      446459  70.4%
/      176367  27.8%
'      11029   1.7%

Space Separator
Value    Count    Frequency (%)
(space)  2261760  100.0%

Close Punctuation
Value  Count  Frequency (%)
)      34374  100.0%

Open Punctuation
Value  Count  Frequency (%)
(      34374  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   34514009  92.1%
Common  2964363   7.9%
7.9%

Most frequent character per script

Latin
Value              Count    Frequency (%)
e                  4003951  11.6%
i                  3407729  9.9%
r                  3140909  9.1%
a                  2593110  7.5%
t                  2547852  7.4%
n                  2521475  7.3%
o                  2133314  6.2%
s                  2064644  6.0%
c                  1890653  5.5%
l                  1428836  4.1%
Other values (37)  8781536  25.4%

Common
Value    Count    Frequency (%)
(space)  2261760  76.3%
,        446459   15.1%
/        176367   5.9%
)        34374    1.2%
(        34374    1.2%
'        11029    0.4%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  37478372  100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
e                  4003951   10.7%
i                  3407729   9.1%
r                  3140909   8.4%
a                  2593110   6.9%
t                  2547852   6.8%
n                  2521475   6.7%
(space)            2261760   6.0%
o                  2133314   5.7%
s                  2064644   5.5%
c                  1890653   5.0%
Other values (43)  10912975  29.1%

dob
Categorical

Distinct984
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size118.4 MiB
1977-03-23          8044
1988-09-15          6574
1981-08-29          6571
1955-05-06          5121
1960-01-13          4395
Other values (979)  1821689

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters18523940
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1988-03-09
2nd row1978-06-21
3rd row1962-01-19
4th row1967-01-12
5th row1986-03-28

Common Values

Value               Count    Frequency (%)
1977-03-23          8044     0.4%
1988-09-15          6574     0.4%
1981-08-29          6571     0.4%
1955-05-06          5121     0.3%
1960-01-13          4395     0.2%
1972-11-28          4392     0.2%
1997-09-22          4392     0.2%
1997-03-12          4386     0.2%
1987-04-23          4386     0.2%
1993-04-08          4385     0.2%
Other values (974)  1799748  97.2%

Length

Histogram of lengths of the category
Value               Count    Frequency (%)
1977-03-23          8044     0.4%
1988-09-15          6574     0.4%
1981-08-29          6571     0.4%
1955-05-06          5121     0.3%
1960-01-13          4395     0.2%
1997-09-22          4392     0.2%
1972-11-28          4392     0.2%
1997-03-12          4386     0.2%
1987-04-23          4386     0.2%
1993-04-08          4385     0.2%
Other values (974)  1799748  97.2%

Most occurring characters

Value  Count    Frequency (%)
-      3704788  20.0%
1      3546230  19.1%
9      2637517  14.2%
0      2559962  13.8%
2      1291776  7.0%
7      950290   5.1%
8      921815   5.0%
6      782368   4.2%
5      765535   4.1%
3      691719   3.7%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    14819152  80.0%
Dash Punctuation  3704788   20.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
1      3546230  23.9%
9      2637517  17.8%
0      2559962  17.3%
2      1291776  8.7%
7      950290   6.4%
8      921815   6.2%
6      782368   5.3%
5      765535   5.2%
3      691719   4.7%
4      671940   4.5%

Dash Punctuation
Value  Count    Frequency (%)
-      3704788  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  18523940  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
-      3704788  20.0%
1      3546230  19.1%
9      2637517  14.2%
0      2559962  13.8%
2      1291776  7.0%
7      950290   5.1%
8      921815   5.0%
6      782368   4.2%
5      765535   4.1%
3      691719   3.7%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  18523940  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
-      3704788  20.0%
1      3546230  19.1%
9      2637517  14.2%
0      2559962  13.8%
2      1291776  7.0%
7      950290   5.1%
8      921815   5.0%
6      782368   4.2%
5      765535   4.1%
3      691719   3.7%

trans_num
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct1852394
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size157.2 MiB
0b242abb623afc578575680df30655b9  1
ad44105b2defec28687e0c0c00db0dd2  1
fff81009b323191d0a427a2af21e5bc7  1
2341854d7594722011878c08ba3819dd  1
f7f00d638b16a8b1045d8a8978b607bd  1
Other values (1852389)            1852389

Length

Max length32
Median length32
Mean length32
Min length32

Characters and Unicode

Total characters59276608
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1852394
Unique (%)100.0%

Sample

1st row0b242abb623afc578575680df30655b9
2nd row1f76529f8574734946361c461b024d99
3rd rowa1a22d70485983eac12b5b88dad1cf95
4th row6b849c168bdad6f867558c3793159a81
5th rowa41d7549acf90789359a9aa5346dcb46

Common Values

Value                             Count    Frequency (%)
0b242abb623afc578575680df30655b9  1        < 0.1%
ad44105b2defec28687e0c0c00db0dd2  1        < 0.1%
fff81009b323191d0a427a2af21e5bc7  1        < 0.1%
2341854d7594722011878c08ba3819dd  1        < 0.1%
f7f00d638b16a8b1045d8a8978b607bd  1        < 0.1%
fd51c30194494698905af5871aa6cc15  1        < 0.1%
67ac45826276ec940cfbc0de6a19c501  1        < 0.1%
8c95f2a0927c7c65bfa4dbb18d31d36b  1        < 0.1%
f44fb23e29d805a76b54dcdfb0e4e755  1        < 0.1%
9589fd1bdbed3034eea19e0cdeefdc15  1        < 0.1%
Other values (1852384)            1852384  > 99.9%

Length

Histogram of lengths of the category
Value                             Count    Frequency (%)
0b242abb623afc578575680df30655b9  1        < 0.1%
0189da32192f942196b01f50bcebeac8  1        < 0.1%
189a841a0a8ba03058526bcfe566aab5  1        < 0.1%
83ec1cc84142af6e2acf10c44949e720  1        < 0.1%
6d294ed2cc447d2c71c7171a3d54967c  1        < 0.1%
fc28024ce480f8ef21a32d64c93a29f5  1        < 0.1%
3b9014ea8fb80bd65de0b1463b00b00e  1        < 0.1%
d71c95ab6b7356dd74389d41df429c87  1        < 0.1%
3c74776e558f1499a7824b556e474b1d  1        < 0.1%
c1d9a7ddb1e34639fe82758de97f4abf  1        < 0.1%
Other values (1852384)            1852384  > 99.9%

Most occurring characters

Value             Count     Frequency (%)
9                 3708557   6.3%
4                 3707696   6.3%
7                 3707599   6.3%
2                 3707045   6.3%
3                 3706132   6.3%
1                 3705118   6.3%
d                 3704966   6.3%
a                 3704452   6.2%
8                 3704258   6.2%
c                 3703707   6.2%
Other values (6)  22217078  37.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    37055836  62.5%
Lowercase Letter  22220772  37.5%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
9      3708557  10.0%
4      3707696  10.0%
7      3707599  10.0%
2      3707045  10.0%
3      3706132  10.0%
1      3705118  10.0%
8      3704258  10.0%
6      3703446  10.0%
5      3703001  10.0%
0      3702984  10.0%

Lowercase Letter
Value  Count    Frequency (%)
d      3704966  16.7%
a      3704452  16.7%
c      3703707  16.7%
f      3703143  16.7%
e      3702587  16.7%
b      3701917  16.7%

Most occurring scripts

Value   Count     Frequency (%)
Common  37055836  62.5%
Latin   22220772  37.5%

Most frequent character per script

Common
Value  Count    Frequency (%)
9      3708557  10.0%
4      3707696  10.0%
7      3707599  10.0%
2      3707045  10.0%
3      3706132  10.0%
1      3705118  10.0%
8      3704258  10.0%
6      3703446  10.0%
5      3703001  10.0%
0      3702984  10.0%

Latin
Value  Count    Frequency (%)
d      3704966  16.7%
a      3704452  16.7%
c      3703707  16.7%
f      3703143  16.7%
e      3702587  16.7%
b      3701917  16.7%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  59276608  100.0%

Most frequent character per block

ASCII
Value             Count     Frequency (%)
9                 3708557   6.3%
4                 3707696   6.3%
7                 3707599   6.3%
2                 3707045   6.3%
3                 3706132   6.3%
1                 3705118   6.3%
d                 3704966   6.3%
a                 3704452   6.2%
8                 3704258   6.2%
c                 3703707   6.2%
Other values (6)  22217078  37.5%

unix_time
Real number (ℝ)

Distinct1819583
Distinct (%)98.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.3586742 × 10⁹
Minimum1.325376 × 10⁹
Maximum1.3885344 × 10⁹
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum1.325376 × 10⁹
5-th percentile1.3300982 × 10⁹
Q11.3430168 × 10⁹
median1.3570893 × 10⁹
Q31.3745815 × 10⁹
95-th percentile1.3867821 × 10⁹
Maximum1.3885344 × 10⁹
Range63158356
Interquartile range (IQR)31564662

Descriptive statistics

Standard deviation18195081
Coefficient of variation (CV)0.013391791
Kurtosis-1.1995793
Mean1.3586742 × 10⁹
Median Absolute Deviation (MAD)15789076
Skewness-0.019735681
Sum2.5168 × 10¹⁵
Variance3.3106099 × 10¹⁴
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
1387312599              4        < 0.1%
1387468942              4        < 0.1%
1381001869              4        < 0.1%
1335110521              4        < 0.1%
1386957227              4        < 0.1%
1370177227              4        < 0.1%
1370050667              4        < 0.1%
1335729039              3        < 0.1%
1339879107              3        < 0.1%
1387727516              3        < 0.1%
Other values (1819573)  1852357  > 99.9%

Value       Count  Frequency (%)
1325376018  1      < 0.1%
1325376044  1      < 0.1%
1325376051  1      < 0.1%
1325376076  1      < 0.1%
1325376186  1      < 0.1%
1325376248  1      < 0.1%
1325376282  1      < 0.1%
1325376308  1      < 0.1%
1325376318  1      < 0.1%
1325376361  1      < 0.1%

Value       Count  Frequency (%)
1388534374  1      < 0.1%
1388534364  1      < 0.1%
1388534355  1      < 0.1%
1388534349  1      < 0.1%
1388534347  1      < 0.1%
1388534314  1      < 0.1%
1388534284  1      < 0.1%
1388534276  1      < 0.1%
1388534270  1      < 0.1%
1388534238  1      < 0.1%

merch_lat
Real number (ℝ)

Distinct1754157
Distinct (%)94.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38.538976
Minimum19.027422
Maximum67.510267
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum19.027422
5-th percentile29.753795
Q134.740122
median39.3689
Q341.956263
95-th percentile46.002013
Maximum67.510267
Range48.482845
Interquartile range (IQR)7.2161407

Descriptive statistics

Standard deviation5.1056039
Coefficient of variation (CV)0.13247897
Kurtosis0.77423362
Mean38.538976
Median Absolute Deviation (MAD)3.38992
Skewness-0.1880969
Sum71389368
Variance26.067191
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
38.363447               4        < 0.1%
38.164527               4        < 0.1%
36.545289               4        < 0.1%
40.822878               4        < 0.1%
39.779565               4        < 0.1%
43.272992               4        < 0.1%
41.829986               4        < 0.1%
40.557026               4        < 0.1%
39.115059               4        < 0.1%
39.818513               4        < 0.1%
Other values (1754147)  1852354  > 99.9%

Value      Count  Frequency (%)
19.027422  1      < 0.1%
19.027785  1      < 0.1%
19.027804  1      < 0.1%
19.027849  1      < 0.1%
19.029798  1      < 0.1%
19.031242  1      < 0.1%
19.032277  1      < 0.1%
19.032689  1      < 0.1%
19.033288  1      < 0.1%
19.034282  1      < 0.1%

Value      Count  Frequency (%)
67.510267  1      < 0.1%
67.441518  1      < 0.1%
67.397018  1      < 0.1%
67.188111  1      < 0.1%
67.064277  1      < 0.1%
66.835174  1      < 0.1%
66.682905  1      < 0.1%
66.679297  1      < 0.1%
66.674714  1      < 0.1%
66.67355   1      < 0.1%

merch_long
Real number (ℝ)

Distinct1809753
Distinct (%)97.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-90.22794
Minimum-166.67157
Maximum-66.950902
Zeros0
Zeros (%)0.0%
Negative1852394
Negative (%)100.0%
Memory size14.1 MiB

Quantile statistics

Minimum-166.67157
5-th percentile-119.30928
Q1-96.89944
median-87.440694
Q3-80.245108
95-th percentile-73.365169
Maximum-66.950902
Range99.720673
Interquartile range (IQR)16.654332

Descriptive statistics

Standard deviation13.759692
Coefficient of variation (CV)-0.15249924
Kurtosis1.8312584
Mean-90.22794
Median Absolute Deviation (MAD)8.2235005
Skewness-1.143933
Sum-1.6713769 × 10⁸
Variance189.32913
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
-96.511763              4        < 0.1%
-81.995265              4        < 0.1%
-74.433003              4        < 0.1%
-81.036745              4        < 0.1%
-73.900295              4        < 0.1%
-95.822621              4        < 0.1%
-74.618269              4        < 0.1%
-87.830842              4        < 0.1%
-80.940524              4        < 0.1%
-79.147111              4        < 0.1%
Other values (1809743)  1852354  > 99.9%

Value        Count  Frequency (%)
-166.671575  1      < 0.1%
-166.671242  1      < 0.1%
-166.670685  1      < 0.1%
-166.670132  1      < 0.1%
-166.670006  1      < 0.1%
-166.66991   1      < 0.1%
-166.669812  1      < 0.1%
-166.669638  1      < 0.1%
-166.666179  1      < 0.1%
-166.664828  1      < 0.1%

Value       Count  Frequency (%)
-66.950902  1      < 0.1%
-66.952026  1      < 0.1%
-66.952352  1      < 0.1%
-66.955602  1      < 0.1%
-66.955996  1      < 0.1%
-66.95654   1      < 0.1%
-66.957364  1      < 0.1%
-66.958659  1      < 0.1%
-66.958751  1      < 0.1%
-66.959178  1      < 0.1%

is_fraud
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size102.5 MiB
0  1842743
1  9651

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1852394
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value  Count    Frequency (%)
0      1842743  99.5%
1      9651     0.5%
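
The overview alerts flag is_fraud as highly imbalanced (95.3%). One way to arrive at a figure like that from the two counts above is one minus the normalised Shannon entropy of the class shares. The sketch below assumes that definition; the function name `imbalance_score` is hypothetical, and the formula is an assumption that happens to reproduce the reported value, not a confirmed description of the profiling tool's internals.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts for is_fraud as listed in this report: 0 -> 1842743, 1 -> 9651.
score = imbalance_score([1842743, 9651])
print(f"{score:.1%}")  # prints 95.3%
```

Under this assumed definition a 50/50 split scores 0, so the 99.5% / 0.5% split here sits close to the maximum.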

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count    Frequency (%)
0      1842743  99.5%
1      9651     0.5%

Most occurring characters

Value  Count    Frequency (%)
0      1842743  99.5%
1      9651     0.5%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1852394  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      1842743  99.5%
1      9651     0.5%

Most occurring scripts

Value   Count    Frequency (%)
Common  1852394  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      1842743  99.5%
1      9651     0.5%
0.5%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1852394  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      1842743  99.5%
1      9651     0.5%
0.5%

amt_month
Real number (ℝ)

Distinct896534
Distinct (%)48.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4153.689
Minimum1
Maximum43261.89
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum1
5-th percentile251.28
Q11344.79
median3071.99
Q35738.47
95-th percentile11792.017
Maximum43261.89
Range43260.89
Interquartile range (IQR)4393.68

Descriptive statistics

Standard deviation3909.0054
Coefficient of variation (CV)0.94109246
Kurtosis6.2031988
Mean4153.689
Median Absolute Deviation (MAD)2005.26
Skewness1.9707692
Sum7.6942686 × 10⁹
Variance15280323
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
1.15                   15       < 0.1%
7.31                   15       < 0.1%
5.29                   15       < 0.1%
9.12                   14       < 0.1%
8.95                   14       < 0.1%
9.46                   14       < 0.1%
1.07                   14       < 0.1%
381.05                 13       < 0.1%
1016.18                13       < 0.1%
251.87                 13       < 0.1%
Other values (896524)  1852254  > 99.9%

Value  Count  Frequency (%)
1      4      < 0.1%
1.01   8      < 0.1%
1.02   9      < 0.1%
1.03   7      < 0.1%
1.04   6      < 0.1%
1.05   5      < 0.1%
1.06   6      < 0.1%
1.07   14     < 0.1%
1.08   9      < 0.1%
1.09   8      < 0.1%

Value     Count  Frequency (%)
43261.89  1      < 0.1%
43055.12  1      < 0.1%
43047.94  1      < 0.1%
43013.27  1      < 0.1%
42923.81  1      < 0.1%
42917.54  1      < 0.1%
42887.02  1      < 0.1%
42841.05  1      < 0.1%
42818.8   1      < 0.1%
42750.39  1      < 0.1%

amt_year
Real number (ℝ)

Distinct1694572
Distinct (%)91.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean45305.597
Minimum1.02
Maximum219086.77
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum1.02
5-th percentile3283.553
Q117341.423
median37439.105
Q364720.88
95-th percentile115831.99
Maximum219086.77
Range219085.75
Interquartile range (IQR)47379.458

Descriptive statistics

Standard deviation35867.522
Coefficient of variation (CV)0.79167972
Kurtosis1.4120611
Mean45305.597
Median Absolute Deviation (MAD)22656.73
Skewness1.1686746
Sum8.3923817 × 10¹⁰
Variance1.2864792 × 10⁹
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
5468.43                 6        < 0.1%
33774.13                5        < 0.1%
12498.63                5        < 0.1%
18523.06                5        < 0.1%
8598.52                 5        < 0.1%
19724.97                5        < 0.1%
10775.67                5        < 0.1%
43987.25                5        < 0.1%
15209.24                5        < 0.1%
5724.87                 5        < 0.1%
Other values (1694562)  1852343  > 99.9%

Value  Count  Frequency (%)
1.02   1      < 0.1%
1.03   1      < 0.1%
1.04   1      < 0.1%
1.07   1      < 0.1%
1.08   1      < 0.1%
1.13   2      < 0.1%
1.15   1      < 0.1%
1.19   1      < 0.1%
1.2    2      < 0.1%
1.21   1      < 0.1%

Value      Count  Frequency (%)
219086.77  1      < 0.1%
219073.58  1      < 0.1%
219025.18  1      < 0.1%
218957.58  1      < 0.1%
218955.06  1      < 0.1%
218941.76  1      < 0.1%
218866.61  1      < 0.1%
218824.2   1      < 0.1%
218743.75  1      < 0.1%
218713.06  1      < 0.1%

amt_month_shopping_net_spend
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct73861
Distinct (%)4.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean376.2028
Minimum0
Maximum12047.18
Zeros276206
Zeros (%)14.9%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q19.02
median75.89
Q3425.98
95-th percentile1717.35
Maximum12047.18
Range12047.18
Interquartile range (IQR)416.96

Descriptive statistics

Standard deviation725.35307
Coefficient of variation (CV)1.9280906
Kurtosis23.999749
Mean376.2028
Median Absolute Deviation (MAD)75.89
Skewness4.0224138
Sum6.9687581 × 10⁸
Variance526137.08
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
0                     276206   14.9%
9.12                  575      < 0.1%
9.89                  528      < 0.1%
9.35                  475      < 0.1%
9.52                  469      < 0.1%
4.49                  465      < 0.1%
9.2                   451      < 0.1%
4.93                  448      < 0.1%
7.85                  436      < 0.1%
3.17                  434      < 0.1%
Other values (73851)  1571907  84.9%

Value  Count   Frequency (%)
0      276206  14.9%
1      28      < 0.1%
1.01   418     < 0.1%
1.02   278     < 0.1%
1.03   238     < 0.1%
1.04   269     < 0.1%
1.05   285     < 0.1%
1.06   178     < 0.1%
1.07   269     < 0.1%
1.08   316     < 0.1%

Value     Count  Frequency (%)
12047.18  15     < 0.1%
10812.12  3      < 0.1%
10805.83  5      < 0.1%
10796.17  28     < 0.1%
10790.23  3      < 0.1%
10339.78  2      < 0.1%
10245.7   11     < 0.1%
10242.58  1      < 0.1%
10238.88  12     < 0.1%
10235.86  12     < 0.1%

count_month_shopping_net
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct49
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4.5672411
Minimum0
Maximum48
Zeros276206
Zeros (%)14.9%
Negative0
Negative (%)0.0%
Memory size14.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median3
Q37
95-th percentile14
Maximum48
Range48
Interquartile range (IQR)6

Descriptive statistics

Standard deviation4.5755024
Coefficient of variation (CV)1.0018088
Kurtosis4.3978581
Mean4.5672411
Median Absolute Deviation (MAD)2
Skewness1.7306414
Sum8460330
Variance20.935222
MonotonicityNot monotonic
Histogram with fixed size bins (bins=49)
Value              Count   Frequency (%)
0                  276206  14.9%
1                  268131  14.5%
2                  227701  12.3%
3                  196162  10.6%
4                  161418  8.7%
5                  135485  7.3%
6                  114389  6.2%
7                  94026   5.1%
8                  76806   4.1%
9                  62216   3.4%
Other values (39)  239854  12.9%

Value  Count   Frequency (%)
0      276206  14.9%
1      268131  14.5%
2      227701  12.3%
3      196162  10.6%
4      161418  8.7%
5      135485  7.3%
6      114389  6.2%
7      94026   5.1%
8      76806   4.1%
9      62216   3.4%

Value  Count  Frequency (%)
48     9      < 0.1%
47     8      < 0.1%
46     6      < 0.1%
45     6      < 0.1%
44     7      < 0.1%
43     7      < 0.1%
42     20     < 0.1%
41     33     < 0.1%
40     53     < 0.1%
39     40     < 0.1%

first_time_at_merchant
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.8 MiB
False  1323066
True   529328

Value  Count    Frequency (%)
False  1323066  71.4%
True   529328   28.6%

Interactions

2024-04-09T13:04:10.854306image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:15.807452image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:20.907112image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:25.993602image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:31.066402image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:36.181200image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:41.262150image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:46.773152image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:45.795055image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:50.920911image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:55.941787image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:01.116603image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:06.234772image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:11.273660image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:16.235963image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:21.326431image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:26.388203image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:31.456189image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:36.559305image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:41.624547image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:47.195604image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:46.257205image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:51.336927image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:56.368818image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:01.506278image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:06.611287image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:11.634636image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:16.603043image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:21.688182image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:26.774472image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:31.838447image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:36.943093image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:42.047806image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:47.551862image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:46.650300image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:51.686682image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:56.747360image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:01.884781image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:06.978111image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:12.023288image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:16.990658image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:22.100931image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:27.177024image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:32.255003image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:37.356915image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:42.429376image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:47.921322image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:47.032635image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:52.085268image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:57.175035image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:02.307400image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:07.380972image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:12.395249image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:17.381518image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:22.475873image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:27.537916image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:32.616587image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:37.751075image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:42.821938image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:48.344846image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:47.412939image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:52.470323image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:57.554753image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:02.670510image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:07.740891image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:12.747820image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:17.736399image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:22.842465image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:27.908421image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:33.064173image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:38.162501image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:43.236636image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:48.717686image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:47.794822image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:52.823165image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:57.917906image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:03.078485image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:08.173809image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:13.158451image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:18.175467image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:23.280025image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:28.323915image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:33.451583image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:38.522487image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:43.594586image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:49.148394image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:48.236820image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:53.252917image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:03:58.339236image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:03.469989image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:08.551539image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:13.514750image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:18.546617image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:23.652770image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:28.694363image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:33.825088image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:38.897336image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/
2024-04-09T13:04:44.027747image/svg+xmlMatplotlib v3.6.3, https://matplotlib.org/

Correlations

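| | cc_num | amt | zip | lat | long | city_pop | unix_time | merch_lat | merch_long | amt_month | amt_year | amt_month_shopping_net_spend | count_month_shopping_net | category | gender | state | is_fraud | first_time_at_merchant |
| cc_num | 1.000 | -0.001 | 0.013 | -0.003 | -0.013 | 0.049 | 0.001 | -0.003 | -0.013 | 0.008 | 0.011 | 0.010 | 0.017 | 0.008 | 0.052 | 0.238 | 0.003 | 0.019 |
| amt | -0.001 | 1.000 | 0.001 | 0.013 | -0.000 | -0.024 | -0.001 | 0.013 | -0.000 | 0.061 | 0.037 | 0.081 | -0.010 | 0.019 | 0.001 | 0.003 | 0.000 | 0.005 |
| zip | 0.013 | 0.001 | 1.000 | -0.162 | -0.959 | -0.040 | 0.001 | -0.162 | -0.957 | 0.017 | 0.017 | 0.025 | 0.029 | 0.011 | 0.116 | 0.963 | 0.004 | 0.025 |
| lat | -0.003 | 0.013 | -0.162 | 1.000 | 0.105 | -0.264 | 0.001 | 0.991 | 0.104 | -0.012 | -0.012 | -0.005 | -0.009 | 0.010 | 0.101 | 0.748 | 0.038 | 0.027 |
| long | -0.013 | -0.000 | -0.959 | 0.105 | 1.000 | 0.087 | -0.001 | 0.105 | 0.998 | -0.011 | -0.010 | -0.017 | -0.026 | 0.009 | 0.091 | 0.863 | 0.038 | 0.016 |
| city_pop | 0.049 | -0.024 | -0.040 | -0.264 | 0.087 | 1.000 | -0.003 | -0.263 | 0.086 | 0.013 | 0.020 | -0.002 | -0.022 | 0.014 | 0.090 | 0.313 | 0.002 | 0.024 |
| unix_time | 0.001 | -0.001 | 0.001 | 0.001 | -0.001 | -0.003 | 1.000 | 0.001 | -0.001 | 0.129 | 0.396 | 0.084 | 0.117 | 0.001 | 0.000 | 0.003 | 0.022 | 0.509 |
| merch_lat | -0.003 | 0.013 | -0.162 | 0.991 | 0.105 | -0.263 | 0.001 | 1.000 | 0.104 | -0.012 | -0.012 | -0.004 | -0.009 | 0.011 | 0.103 | 0.759 | 0.038 | 0.020 |
| merch_long | -0.013 | -0.000 | -0.957 | 0.104 | 0.998 | 0.086 | -0.001 | 0.104 | 1.000 | -0.011 | -0.010 | -0.017 | -0.026 | 0.009 | 0.083 | 0.829 | 0.038 | 0.013 |
| amt_month | 0.008 | 0.061 | 0.017 | -0.012 | -0.011 | 0.013 | 0.129 | -0.012 | -0.011 | 1.000 | 0.472 | 0.732 | 0.821 | 0.016 | 0.106 | 0.055 | 0.030 | 0.167 |
| amt_year | 0.011 | 0.037 | 0.017 | -0.012 | -0.010 | 0.020 | 0.396 | -0.012 | -0.010 | 0.472 | 1.000 | 0.348 | 0.422 | 0.020 | 0.146 | 0.083 | 0.036 | 0.351 |
| amt_month_shopping_net_spend | 0.010 | 0.081 | 0.025 | -0.005 | -0.017 | -0.002 | 0.084 | -0.004 | -0.017 | 0.732 | 0.348 | 1.000 | 0.777 | 0.014 | 0.091 | 0.051 | 0.091 | 0.065 |
| count_month_shopping_net | 0.017 | -0.010 | 0.029 | -0.009 | -0.026 | -0.022 | 0.117 | -0.009 | -0.026 | 0.821 | 0.422 | 0.777 | 1.000 | 0.023 | 0.125 | 0.055 | 0.014 | 0.169 |
| category | 0.008 | 0.019 | 0.011 | 0.010 | 0.009 | 0.014 | 0.001 | 0.011 | 0.009 | 0.016 | 0.020 | 0.014 | 0.023 | 1.000 | 0.054 | 0.019 | 0.067 | 0.139 |
| gender | 0.052 | 0.001 | 0.116 | 0.101 | 0.091 | 0.090 | 0.000 | 0.103 | 0.083 | 0.106 | 0.146 | 0.091 | 0.125 | 0.054 | 1.000 | 0.256 | 0.006 | 0.044 |
| state | 0.238 | 0.003 | 0.963 | 0.748 | 0.863 | 0.313 | 0.003 | 0.759 | 0.829 | 0.055 | 0.083 | 0.051 | 0.055 | 0.019 | 0.256 | 1.000 | 0.033 | 0.058 |
| is_fraud | 0.003 | 0.000 | 0.004 | 0.038 | 0.038 | 0.002 | 0.022 | 0.038 | 0.038 | 0.030 | 0.036 | 0.091 | 0.014 | 0.067 | 0.006 | 0.033 | 1.000 | 0.028 |
| first_time_at_merchant | 0.019 | 0.005 | 0.025 | 0.027 | 0.016 | 0.024 | 0.509 | 0.020 | 0.013 | 0.167 | 0.351 | 0.065 | 0.169 | 0.139 | 0.044 | 0.058 | 0.028 | 1.000 |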
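
One way to read the matrix is to list only the strongly related pairs. The sketch below is illustrative and is not part of the report's own output: it copies a handful of coefficients from the table by hand and keeps those whose absolute value reaches a cutoff. The cutoff of 0.5 and the helper name strong_pairs are assumptions made for this example.

```python
# Coefficients copied by hand from the correlation matrix (a subset, not all pairs).
coeffs = {
    ("zip", "long"): -0.959,
    ("zip", "merch_long"): -0.957,
    ("lat", "merch_lat"): 0.991,
    ("long", "merch_long"): 0.998,
    ("amt_month", "count_month_shopping_net"): 0.821,
    ("amt_month", "amt_month_shopping_net_spend"): 0.732,
    ("unix_time", "first_time_at_merchant"): 0.509,
    ("amt", "is_fraud"): 0.000,
}


def strong_pairs(coeffs, threshold=0.5):
    """Return (var_a, var_b, r) tuples with |r| >= threshold, strongest first.

    The threshold is an assumed cutoff chosen for illustration.
    """
    hits = [(a, b, r) for (a, b), r in coeffs.items() if abs(r) >= threshold]
    return sorted(hits, key=lambda t: abs(t[2]), reverse=True)


if __name__ == "__main__":
    for a, b, r in strong_pairs(coeffs):
        print(f"{a:<12} {b:<30} {r:+.3f}")
```

With these inputs the amt / is_fraud pair is filtered out, while the geographic pairs (lat with merch_lat, long with merch_long, zip with long) rank highest.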

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

| | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud | amt_month | amt_year | amt_month_shopping_net_spend | count_month_shopping_net | first_time_at_merchant |
| 0 | 2019-01-01 00:00:18 | 2703186189652095 | fraud_Rippin, Kub and Mann | misc_net | 4.97 | Jennifer | Banks | F | 561 Perry Cove | Moravian Falls | NC | 28654 | 36.0788 | -81.1781 | 3495 | Psychologist, counselling | 1988-03-09 | 0b242abb623afc578575680df30655b9 | 1325376018 | 36.011293 | -82.048315 | 0 | 4.97 | 4.97 | 0.0 | 0.0 | True |
| 1 | 2019-01-01 00:00:44 | 630423337322 | fraud_Heller, Gutmann and Zieme | grocery_pos | 107.23 | Stephanie | Gill | F | 43039 Riley Greens Suite 393 | Orient | WA | 99160 | 48.8878 | -118.2105 | 149 | Special educational needs teacher | 1978-06-21 | 1f76529f8574734946361c461b024d99 | 1325376044 | 49.159047 | -118.186462 | 0 | 107.23 | 107.23 | 0.0 | 0.0 | True |
| 2 | 2019-01-01 00:00:51 | 38859492057661 | fraud_Lind-Buckridge | entertainment | 220.11 | Edward | Sanchez | M | 594 White Dale Suite 530 | Malad City | ID | 83252 | 42.1808 | -112.2620 | 4154 | Nature conservation officer | 1962-01-19 | a1a22d70485983eac12b5b88dad1cf95 | 1325376051 | 43.150704 | -112.154481 | 0 | 220.11 | 220.11 | 0.0 | 0.0 | True |
| 3 | 2019-01-01 00:01:16 | 3534093764340240 | fraud_Kutch, Hermiston and Farrell | gas_transport | 45.00 | Jeremy | White | M | 9443 Cynthia Court Apt. 038 | Boulder | MT | 59632 | 46.2306 | -112.1138 | 1939 | Patent attorney | 1967-01-12 | 6b849c168bdad6f867558c3793159a81 | 1325376076 | 47.034331 | -112.561071 | 0 | 45.00 | 45.00 | 0.0 | 0.0 | True |
| 4 | 2019-01-01 00:03:06 | 375534208663984 | fraud_Keeling-Crist | misc_pos | 41.96 | Tyler | Garcia | M | 408 Bradley Rest | Doe Hill | VA | 24433 | 38.4207 | -79.4629 | 99 | Dance movement psychotherapist | 1986-03-28 | a41d7549acf90789359a9aa5346dcb46 | 1325376186 | 38.674999 | -78.632459 | 0 | 41.96 | 41.96 | 0.0 | 0.0 | True |
| 5 | 2019-01-01 00:04:08 | 4767265376804500 | fraud_Stroman, Hudson and Erdman | gas_transport | 94.63 | Jennifer | Conner | F | 4655 David Island | Dublin | PA | 18917 | 40.3750 | -75.2045 | 2158 | Transport planner | 1961-06-19 | 189a841a0a8ba03058526bcfe566aab5 | 1325376248 | 40.653382 | -76.152667 | 0 | 94.63 | 94.63 | 0.0 | 0.0 | True |
| 6 | 2019-01-01 00:04:42 | 30074693890476 | fraud_Rowe-Vandervort | grocery_net | 44.54 | Kelsey | Richards | F | 889 Sarah Station Suite 624 | Holcomb | KS | 67851 | 37.9931 | -100.9893 | 2691 | Arboriculturist | 1993-08-16 | 83ec1cc84142af6e2acf10c44949e720 | 1325376282 | 37.162705 | -100.153370 | 0 | 44.54 | 44.54 | 0.0 | 0.0 | True |
| 7 | 2019-01-01 00:05:08 | 6011360759745864 | fraud_Corwin-Collins | gas_transport | 71.65 | Steven | Williams | M | 231 Flores Pass Suite 720 | Edinburg | VA | 22824 | 38.8432 | -78.6003 | 6018 | Designer, multimedia | 1947-08-21 | 6d294ed2cc447d2c71c7171a3d54967c | 1325376308 | 38.948089 | -78.540296 | 0 | 71.65 | 71.65 | 0.0 | 0.0 | True |
| 8 | 2019-01-01 00:05:18 | 4922710831011201 | fraud_Herzog Ltd | misc_pos | 4.27 | Heather | Chase | F | 6888 Hicks Stream Suite 954 | Manor | PA | 15665 | 40.3359 | -79.6607 | 1472 | Public affairs consultant | 1941-03-07 | fc28024ce480f8ef21a32d64c93a29f5 | 1325376318 | 40.351813 | -79.958146 | 0 | 4.27 | 4.27 | 0.0 | 0.0 | True |
| 9 | 2019-01-01 00:06:01 | 2720830304681674 | fraud_Schoen, Kuphal and Nitzsche | grocery_pos | 198.39 | Melissa | Aguilar | F | 21326 Taylor Squares Suite 708 | Clarksville | TN | 37040 | 36.5220 | -87.3490 | 151785 | Pathologist | 1974-03-28 | 3b9014ea8fb80bd65de0b1463b00b00e | 1325376361 | 37.179198 | -87.485381 | 0 | 198.39 | 198.39 | 0.0 | 0.0 | True |
| | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud | amt_month | amt_year | amt_month_shopping_net_spend | count_month_shopping_net | first_time_at_merchant |
| 1852384 | 2020-12-31 23:57:18 | 30344654314976 | fraud_Larkin, Stracke and Greenfelder | entertainment | 46.71 | Christine | Johnson | F | 8011 Chapman Tunnel Apt. 568 | Blairsden-Graeagle | CA | 96103 | 39.8127 | -120.6405 | 1725 | Chartered legal executive (England and Wales) | 1967-05-27 | a7105564935ea3977dc61ff9ced3bf5e | 1388534238 | 38.963543 | -120.457121 | 0 | 7420.57 | 41336.21 | 1706.73 | 9.0 | False |
| 1852385 | 2020-12-31 23:57:50 | 3524574586339330 | fraud_Heathcote, Yost and Kertzmann | shopping_net | 29.56 | Ashley | Cabrera | F | 94225 Smith Springs Apt. 617 | Vero Beach | FL | 32960 | 27.6330 | -80.4031 | 105638 | Librarian, public | 1986-05-07 | 9fc9f6f9be3182d519a61a119cf97199 | 1388534270 | 27.593881 | -80.855092 | 0 | 14501.28 | 99329.66 | 2299.32 | 20.0 | False |
| 1852386 | 2020-12-31 23:57:56 | 341546199006537 | fraud_Schmidt-Larkin | home | 12.68 | Mark | Brown | M | 8580 Moore Cove | Wales | AK | 99783 | 64.7556 | -165.6723 | 145 | Administrator, education | 1939-11-09 | a8310343c189e4a5b6316050d2d6b014 | 1388534276 | 65.623593 | -165.186033 | 0 | 8706.23 | 67211.98 | 68.22 | 5.0 | False |
| 1852387 | 2020-12-31 23:58:04 | 501802953619 | fraud_Pouros, Walker and Spencer | kids_pets | 13.02 | Robert | Flores | M | 3277 Fields Meadows Apt. 790 | Greenview | CA | 96037 | 41.5403 | -122.9366 | 308 | Call centre manager | 1958-09-20 | bd7071fd5c9510a5594ee196368ac80e | 1388534284 | 41.973127 | -123.553032 | 0 | 9016.43 | 65502.89 | 1161.11 | 17.0 | False |
| 1852388 | 2020-12-31 23:58:34 | 3523843138706408 | fraud_Prosacco, Kreiger and Kovacek | home | 17.00 | Grace | Williams | F | 28812 Charles Mill Apt. 628 | Plantersville | AL | 36758 | 32.6176 | -86.9475 | 1412 | Drilling engineer | 1970-11-20 | 6d04313bfe4b661b8ca2b6a499a320fe | 1388534314 | 32.164145 | -87.539669 | 0 | 13874.19 | 78212.66 | 1393.06 | 15.0 | False |
| 1852389 | 2020-12-31 23:59:07 | 30560609640617 | fraud_Reilly and Sons | health_fitness | 43.77 | Michael | Olson | M | 558 Michael Estates | Luray | MO | 63453 | 40.4931 | -91.8912 | 519 | Town planner | 1966-02-13 | 9b1f753c79894c9f4b71f04581835ada | 1388534347 | 39.946837 | -91.333331 | 0 | 11619.63 | 72134.23 | 1014.44 | 11.0 | False |
| 1852390 | 2020-12-31 23:59:09 | 3556613125071656 | fraud_Hoppe-Parisian | kids_pets | 111.84 | Jose | Vasquez | M | 572 Davis Mountains | Lake Jackson | TX | 77566 | 29.0393 | -95.4401 | 28739 | Futures trader | 1999-12-27 | 2090647dac2c89a1d86c514c427f5b91 | 1388534349 | 29.661049 | -96.186633 | 0 | 15224.47 | 87115.43 | 3942.78 | 25.0 | False |
| 1852391 | 2020-12-31 23:59:15 | 6011724471098086 | fraud_Rau-Robel | kids_pets | 86.88 | Ann | Lawson | F | 144 Evans Islands Apt. 683 | Burbank | WA | 99323 | 46.1966 | -118.9017 | 3684 | Musician | 1981-11-29 | 6c5b7c8add471975aa0fec023b2e8408 | 1388534355 | 46.658340 | -119.715054 | 0 | 26233.12 | 165389.30 | 2978.91 | 29.0 | False |
| 1852392 | 2020-12-31 23:59:24 | 4079773899158 | fraud_Breitenberg LLC | travel | 7.99 | Eric | Preston | M | 7020 Doyle Stream Apt. 951 | Mesa | ID | 83643 | 44.6255 | -116.4493 | 129 | Cartographer | 1965-12-15 | 14392d723bb7737606b2700ac791b7aa | 1388534364 | 44.470525 | -117.080888 | 0 | 11787.71 | 90698.65 | 768.69 | 17.0 | False |
| 1852393 | 2020-12-31 23:59:34 | 4170689372027579 | fraud_Dare-Marvin | entertainment | 38.13 | Samuel | Frey | M | 830 Myers Plaza Apt. 384 | Edmond | OK | 73034 | 35.6665 | -97.4798 | 116001 | Media buyer | 1993-05-10 | 1765bb45b3aa3224b4cdcb6e7a96cee3 | 1388534374 | 36.210097 | -97.036372 | 0 | 13871.45 | 116400.29 | 883.31 | 18.0 | False |